Superposed speech localisation using frequency tracking
Authors
Abstract
In this paper we present a new approach for localising regions of superposed speech. The system tracks the frequencies of speech segments by following the evolution of the dominant-amplitude frequency components, and requires no training of acoustic or prosodic models. The resulting frequency tracks are then grouped using a distance based on harmonicity, each group being taken as the production of a single speaker. The co-occurrence of different harmonic groups is then interpreted as evidence of the presence of multiple speakers. Our method has been evaluated on data from the French ANR evaluation campaign ETAPE, and the results show that the approach is usable.
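As a rough illustration of the grouping idea in the abstract, the sketch below assigns spectral peak frequencies from a single analysis frame to harmonic groups and flags the frame as superposed when at least two groups are found. It is not the authors' implementation: the function names, the candidate fundamental-frequency grid, the relative tolerance `tol`, the minimum group size `min_size`, and the greedy assignment are all assumptions made for the example, and the paper's actual harmonicity distance and track-level processing are not given in the abstract.

```python
def harmonic_distance(freq, f0):
    """Relative deviation of freq from the nearest integer multiple of f0."""
    k = max(1, round(freq / f0))
    return abs(freq - k * f0) / (k * f0)


def group_by_harmonicity(peaks, f0_candidates, tol=0.01, min_size=3):
    """Greedily assign peak frequencies (Hz) to harmonic groups.

    Hypothetical helper: returns a dict mapping each retained candidate f0
    to the list of peaks it explains. Peaks that fit no group are dropped.
    """
    remaining = sorted(peaks)
    groups = {}
    while remaining:
        # Pick the candidate f0 that explains the most remaining peaks.
        best_f0, best_members = None, []
        for f0 in f0_candidates:
            members = [p for p in remaining if harmonic_distance(p, f0) <= tol]
            if len(members) > len(best_members):
                best_f0, best_members = f0, members
        if len(best_members) < min_size:
            break  # no candidate explains enough of what is left
        groups[best_f0] = best_members
        remaining = [p for p in remaining if p not in best_members]
    return groups


def is_superposed(peaks, f0_candidates, **kwargs):
    """Assumed decision rule: two or more harmonic groups in one frame."""
    return len(group_by_harmonicity(peaks, f0_candidates, **kwargs)) >= 2


# Illustrative, made-up peak frequencies: two speakers at 100 Hz and 170 Hz.
candidates = list(range(80, 300, 10))
mixed = [100, 200, 300, 400, 170, 340, 510, 680]
single = [120, 240, 360, 480]
print(group_by_harmonicity(mixed, candidates))
print(is_superposed(mixed, candidates), is_superposed(single, candidates))
```

In a fuller system the grouping would presumably operate on frequency tracks over time rather than on isolated frames, and the tolerance would need tuning, since a loose tolerance lets one fundamental absorb another speaker's higher harmonics.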
Similar resources
Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain
This article presents a new feature extraction technique based on the temporal tracking of clusters in the spectro-temporal feature space. In the proposed method, auditory cortical outputs were clustered. The attributes of speech clusters were extracted as secondary features. However, the shape and position of speech clusters change over time. The clusters were temporally tracked and temporal tra...
Particle Filtering Methods for Acoustic Source Localisation and Tracking
The task of acoustic source tracking plays an important role in many practical speech acquisition systems. This research presents an extensive study of sequential Monte Carlo methods applied to the source localisation problem, based on the signals received at an array of microphones. A general framework for acoustic source localisation using particle filtering is proposed, and four different al...
Importance Sampling Particle Filter for Robust Acoustic Source Localisation and Tracking in Reverberant Environments
The concept of acoustic source localisation and tracking (ASLT) plays an important role in many practical speech acquisition systems. Exact knowledge of the speaker position is usually the key to acquiring clean speech using, e.g., beamforming or equalisation. Multipath sound propagation in practical environments, however, constitutes a major challenge for any array-based tracker. The p...
Integrating pitch and localisation cues at a speech fragment level
This paper proposes a novel speech-fragment based approach for processing binaural data to improve the estimation of speech source locations in reverberant, multi-speaker recordings. The technique employs two stages. First, a robust multipitch tracking algorithm is used to locate local spectro-temporal ‘speech fragments’ – regions where the energy in the mixture is dominated by a single speech ...
Binaural sound source localisation and tracking using a dynamic spherical head model
This paper introduces a binaural model for the localisation and tracking of a moving sound source’s azimuth in the horizontal plane. The model uses a nonlinear state space representation of the sound source dynamics including the current position of the listener’s head. The state is estimated via an unscented Kalman Filter by comparing the interaural level and time differences of the binaural s...